    Is speech the new blood? Recent progress in AI-based disease detection from audio in a nutshell

    In recent years, advancements in the field of artificial intelligence (AI) have impacted several areas of research and application. Besides more prominent examples like self-driving cars or media consumption algorithms, AI-based systems have also started to gain popularity in the health care sector, albeit restrained by high requirements for accuracy, robustness, and explainability. Health-oriented AI research, as a sub-field of digital health, investigates a plethora of human-centered modalities. In this article, we address recent advances in the so far understudied but highly promising audio domain, with a particular focus on speech data, and present corresponding state-of-the-art technologies. Moreover, we give an excerpt of recent studies on the automatic audio-based detection of diseases, ranging from acute and chronic respiratory diseases via psychiatric disorders to developmental and neurodegenerative disorders. Our selection of the presented literature shows that the recent success of deep learning methods in other fields of AI increasingly translates to the field of digital health, although expert-designed feature extractors and classical ML methodologies are still prominently used. Limiting factors, especially for speech-based disease detection systems, are related to the amount and diversity of available data, e.g., the number of patients and healthy controls as well as the underlying distribution of age, languages, and cultures. Finally, we contextualize and outline application scenarios of speech-based disease detection systems as supportive tools for health-care professionals, under ethical consideration of privacy protection and faulty prediction.

    The acoustic dissection of cough: diving into machine listening-based COVID-19 analysis and detection

    OBJECTIVES: The coronavirus disease 2019 (COVID-19) has caused a crisis worldwide. Substantial efforts have been made to prevent and control COVID-19's transmission, from early screenings to vaccinations and treatments. Recently, owing to the emergence of many automatic disease recognition applications based on machine listening techniques, it has become possible to detect COVID-19 quickly and cheaply from recordings of cough, a key symptom of COVID-19. To date, knowledge of the acoustic characteristics of COVID-19 cough sounds is limited, but such knowledge would be essential for building effective and robust machine learning models. The present study aims to explore acoustic features for distinguishing COVID-19 positive individuals from COVID-19 negative ones based on their cough sounds. METHODS: By applying conventional inferential statistics, we analyze the acoustic correlates of COVID-19 cough sounds based on the ComParE feature set, i.e., a standardized set of 6,373 acoustic higher-level features. Furthermore, we train automatic COVID-19 detection models with machine learning methods and explore the latent features by evaluating the contribution of all features to the COVID-19 status predictions. RESULTS: The experimental results demonstrate that a set of acoustic parameters of cough sounds, e.g., statistical functionals of the root mean square energy and Mel-frequency cepstral coefficients, carries essential acoustic information, in terms of effect sizes, for the differentiation between COVID-19 positive and COVID-19 negative cough samples. Our general automatic COVID-19 detection model performs significantly above chance level, i.e., at an unweighted average recall (UAR) of 0.632, on a data set consisting of 1,411 cough samples (COVID-19 positive/negative: 210/1,201). CONCLUSIONS: Based on the analysis of acoustic correlates on the ComParE feature set and the feature analysis in the effective COVID-19 detection approach, we find that several acoustic features that show larger effects in conventional group difference testing are also weighted more highly in the machine learning models.
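
    The ComParE set named above is distributed with the openSMILE toolkit; the following sketch, assuming its Python wrapper and using placeholder file names and toy labels (not study data), shows how such functionals can be extracted and how UAR, i.e., macro-averaged recall, is computed.

```python
# Minimal sketch, assuming the open-source openSMILE Python wrapper
# (`pip install opensmile`): extract the 6,373 ComParE 2016 functionals
# from a cough recording and score a detector with unweighted average
# recall (UAR). The file name and labels are placeholders, not study data.
import opensmile
from sklearn.metrics import recall_score

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("cough.wav")  # 1 row x 6,373 feature columns

# UAR is the macro-averaged recall: chance level stays at 0.5 for two
# classes despite the 210/1,201 class imbalance reported above.
y_true = [1, 1, 1, 0, 0, 0]  # toy labels: 1 = COVID-19 positive
y_pred = [1, 1, 0, 0, 0, 1]  # toy predictions
print(f"UAR: {recall_score(y_true, y_pred, average='macro'):.3f}")
```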

    Vocalisation repertoire at the end of the first year of life: an exploratory comparison of Rett syndrome and typical development

    Rett syndrome (RTT) is a rare, late-detected developmental disorder associated with severe deficits in the speech-language domain. Despite a few reports about atypicalities in the speech-language development of infants and toddlers with RTT, a detailed analysis of the pre-linguistic vocalisation repertoire of infants with RTT is still missing. Based on home video recordings, we analysed the vocalisations between 9 and 11 months of age of three female infants with typical RTT and compared them to three age-matched typically developing (TD) female controls. The video material of the infants had a total duration of 424 min with 1655 infant vocalisations. For each month, we (1) calculated the infants’ canonical babbling ratios, CBR(UTTER), i.e., the ratio of the number of utterances containing canonical syllables to the total number of utterances, and (2) classified their pre-linguistic vocalisations into three non-canonical and four canonical vocalisation subtypes. All infants achieved the milestone of canonical babbling at 9 months of age according to their canonical babbling ratios, i.e., CBR(UTTER) ≥ 0.15. We found overall lower CBR(UTTER) values and a lower proportion of canonical pre-linguistic vocalisations, i.e., vocalisations consisting of well-formed sounds that could serve as parts of target-language words, in the RTT group compared to the TD group. Further studies with more data from individuals with RTT are needed to study the atypicalities in the pre-linguistic vocalisation repertoire that may portend the later deficits in spoken language which are characteristic features of RTT.
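
    Since the abstract defines CBR(UTTER) explicitly, the computation can be illustrated directly; the per-utterance boolean coding below is a hypothetical format, not the study's annotation scheme.

```python
# Minimal sketch of the CBR(UTTER) computation defined above: utterances
# containing at least one canonical syllable, divided by all utterances.
# The per-utterance boolean coding is a hypothetical format.

def canonical_babbling_ratio(has_canonical):
    """has_canonical: list of bools, one per utterance in a given month."""
    if not has_canonical:
        raise ValueError("no coded utterances")
    return sum(has_canonical) / len(has_canonical)

# Toy month: 4 of 20 utterances contain a canonical syllable -> CBR = 0.20
coded = [True] * 4 + [False] * 16
cbr = canonical_babbling_ratio(coded)
print(f"CBR(UTTER) = {cbr:.2f}; milestone reached: {cbr >= 0.15}")
```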

    Automatic vocalisation-based detection of fragile X syndrome and Rett syndrome

    Fragile X syndrome (FXS) and Rett syndrome (RTT) are developmental disorders currently not diagnosed before toddlerhood. Even though speech-language deficits are among the key symptoms of both conditions, little is known about infant vocalisation acoustics that could enable an automatic earlier identification of affected individuals. To bridge this gap, we applied intelligent audio analysis methodology to a compact dataset of 4454 home-recorded vocalisations of 3 individuals with FXS and 3 individuals with RTT aged 6 to 11 months, as well as 6 age- and gender-matched typically developing controls (TD). On the basis of a standardised set of 88 acoustic features, we trained linear kernel support vector machines to evaluate the feasibility of automatic classification of (a) FXS vs TD, (b) RTT vs TD, (c) atypical development (FXS+RTT) vs TD, and (d) FXS vs RTT vs TD. In paradigms (a)–(c), all infants were correctly classified; in paradigm (d), 9 of the 12 infants were. Spectral/cepstral and energy-related features were most relevant for classification across all paradigms. Despite the small sample size, this study reveals new insights into early vocalisation characteristics in FXS and RTT, and provides technical underpinnings for a future earlier identification of affected individuals, enabling earlier intervention and family counselling.
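
    The abstract does not name the 88-feature set or the validation scheme; the sketch below assumes eGeMAPS-like features and a leave-one-infant-out evaluation purely for illustration, with synthetic placeholder data throughout.

```python
# Sketch of the described pipeline: 88 acoustic features per vocalisation
# (eGeMAPS-like; an assumption) fed to a linear kernel SVM, evaluated
# leave-one-infant-out (also an assumption). All data below are synthetic.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(4454, 88))           # one feature vector per vocalisation
y = rng.integers(0, 2, size=4454)         # toy labels: 0 = TD, 1 = FXS+RTT
infant = rng.integers(0, 12, size=4454)   # infant identity for grouping

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
pred = cross_val_predict(clf, X, y, groups=infant, cv=LeaveOneGroupOut())

# A per-infant decision could then be taken as the majority vote over
# that infant's vocalisation-level predictions.
for i in range(12):
    votes = pred[infant == i]
    print(f"infant {i}: predicted class {int(round(votes.mean()))}")
```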

    The voice of COVID-19: acoustic correlates of infection in sustained vowels

    COVID-19 is a global health crisis that has been affecting many aspects of our daily lives throughout the past year. The symptomatology of COVID-19 is heterogeneous, with a severity continuum. A considerable proportion of symptoms are related to pathological changes in the vocal system, leading to the assumption that COVID-19 may also affect voice production. For the very first time, the present study investigates voice acoustic correlates of an infection with COVID-19 on the basis of a comprehensive acoustic parameter set. We compare 88 acoustic features extracted from recordings of the vowels /i:/, /e:/, /o:/, /u:/, and /a:/ produced by 11 symptomatic COVID-19 positive and 11 COVID-19 negative German-speaking participants. We employ the Mann-Whitney U test and calculate effect sizes to identify the features with the most prominent group differences. The mean voiced segment length and the number of voiced segments per second yield the most important differences across all vowels, indicating discontinuities in the pulmonic airstream during phonation in COVID-19 positive participants. Group differences in the front vowels /i:/ and /e:/ are additionally reflected in the variation of the fundamental frequency and the harmonics-to-noise ratio; group differences in the back vowels /o:/ and /u:/ are reflected in statistics of the Mel-frequency cepstral coefficients and the spectral slope. The findings of this study can be considered an important proof-of-concept contribution for a potential future voice-based identification of individuals infected with COVID-19.
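
    A sketch of this per-feature testing procedure follows; the rank-biserial correlation is assumed as the effect size measure (the abstract does not specify which measure was used), and the data are synthetic.

```python
# Sketch of the statistical analysis described above: a Mann-Whitney U test
# per acoustic feature plus an effect size. The rank-biserial correlation is
# used here as the effect size; the abstract does not name the exact measure.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
pos = rng.normal(0.5, 1.0, size=11)  # one feature, COVID-19 positive (n=11)
neg = rng.normal(0.0, 1.0, size=11)  # same feature, COVID-19 negative (n=11)

u_stat, p_value = mannwhitneyu(pos, neg, alternative="two-sided")
r = 1 - 2 * u_stat / (len(pos) * len(neg))  # rank-biserial, in [-1, 1]
print(f"U = {u_stat:.1f}, p = {p_value:.3f}, effect size r = {r:.2f}")
```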

    Evaluating the impact of voice activity detection on speech emotion recognition for autistic children

    Individuals with autism are known to face challenges with emotion regulation, and express their affective states in a variety of ways. With this in mind, an increasing amount of research on automatic affect recognition from speech and other modalities has recently been presented to assist and provide support, as well as to improve the understanding of autistic individuals' behaviours. Beyond the emotion expressed in the voice, the dynamics of autistic children's verbal speech can be inconsistent and vary greatly amongst individuals. The current contribution outlines a voice activity detection (VAD) system specifically adapted to autistic children's vocalisations. The presented VAD system is a recurrent neural network (RNN) with long short-term memory (LSTM) cells. It is trained on 130 acoustic Low-Level Descriptors (LLDs) extracted from more than 17 h of audio recordings, which were richly annotated by experts in terms of perceived emotion as well as the occurrence and type of vocalisations. The data consist of recordings of 25 English-speaking autistic children undertaking a structured, partly robot-assisted emotion-training activity, and were collected as part of the DE-ENIGMA project. The VAD system is further utilised as a preprocessing step for a continuous speech emotion recognition (SER) task aiming to minimise the effects of potentially confounding information, such as noise, silence, or non-child vocalisations. Its impact on the SER performance is compared to the impact of other VAD systems, including a general VAD system trained on the same data set, an out-of-the-box Web Real-Time Communication (WebRTC) VAD system, as well as the expert annotations. Our experiments show that the child VAD system achieves a lower performance than our general VAD system trained under identical conditions, with receiver operating characteristic area under the curve (ROC-AUC) values of 0.662 and 0.850, respectively. The SER results show varying performances across valence and arousal depending on the utilised VAD system, with a maximum concordance correlation coefficient (CCC) of 0.263 and a minimum root mean square error (RMSE) of 0.107. Although the performance of the SER models is generally low, the child VAD system can lead to slightly improved results compared to other VAD systems, in particular the VAD-less baseline, supporting the hypothesised importance of child VAD systems in the discussed context.
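
    A minimal sketch of such a frame-level LSTM VAD, and of the CCC metric quoted above, follows; PyTorch is assumed, and the layer sizes and the 0.5 decision threshold are illustrative assumptions rather than the paper's configuration.

```python
# Minimal sketch, assuming PyTorch: a frame-level LSTM voice activity
# detector over 130 acoustic LLDs, plus the CCC metric quoted above.
# Layer sizes and the 0.5 threshold are illustrative assumptions.
import torch
import torch.nn as nn

class LstmVad(nn.Module):
    def __init__(self, n_llds=130, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_llds, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):  # x: (batch, frames, 130)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.head(h)).squeeze(-1)  # (batch, frames)

def ccc(x, y):
    # Concordance correlation coefficient:
    # CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (x.var(unbiased=False) + y.var(unbiased=False) + (mx - my) ** 2)

vad = LstmVad()
probs = vad(torch.randn(2, 500, 130))  # 2 clips, 500 frames of LLDs each
child_voice = probs > 0.5              # frame mask passed to the SER front end
```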

    Changing the perspective on early development of Rett syndrome

    We delineated the achievement of early speech-language milestones in 15 young children with Rett syndrome (MECP2 positive) in the first two years of life using retrospective video analysis. In contrast to the commonly accepted concept that these children develop normally in the pre-regression period, we found markedly atypical development of speech-language capacities, suggesting a paradigm shift in the understanding of the pathogenesis of Rett syndrome and a possible approach to its early detection.